Supplemental Material for “A Practical Algorithm for Topic Modeling with Provable Guarantees”

Authors

  • Sanjeev Arora
  • Yoni Halpern
  • Yichen Wu
Abstract

Figure 1. Illustration of the algorithm (panels a to d).

Recall that the correctness of the algorithm depends on the following lemma:

Lemma 1.1. The point $d_j$ found by the algorithm must be $\delta = O(\epsilon/\gamma^2)$ close to some vertex $v_i$. In particular, the corresponding $a_j$ $O(\epsilon/\gamma^2)$-covers $v_i$.

In order to prove this lemma, we first show that even if the previously found vertices are only $\delta$ close to some vertices, there is still another vertex that is far from the span of the previously found vertices.

Lemma 1.2. Suppose all previously found vertices are $O(\epsilon/\gamma^2)$ close to distinct vertices. Then there is a vertex $v_i$ whose distance from $\mathrm{span}(S)$ is at least $\gamma/2$.

In order to prove Lemma 1.2, we use a volume argument. First we show that the volume of a robust simplex cannot change by too much when the vertices are perturbed.

[...] are the vertices of a $\gamma$-robust simplex $S$. Let $S'$ be a simplex with vertices [...] $\sqrt{K}\delta < \gamma$, the volumes of the two simplices satisfy

$$\mathrm{vol}(S)\,(1 - 2\delta/\gamma)^{K-1} \le \mathrm{vol}(S') \le \mathrm{vol}(S)\,(1 + 4\delta/\gamma)^{K-1}.$$

Proof: As the volume of a simplex is proportional to the determinant of a matrix whose columns are the edges of the simplex, we first show the following perturbation bound for the determinant.
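The volume argument in the abstract can be explored numerically. The following is a minimal sketch, not the authors' code: it assumes NumPy, and the helper names `simplex_volume` and `perturb` are hypothetical. It computes the volume of a simplex from the Gram determinant of its edge matrix (which also handles a simplex embedded in a higher-dimensional space), then moves every vertex by a small distance `delta` and reports how much the volume changes.

```python
import math
import numpy as np

def simplex_volume(vertices):
    """Volume of the simplex whose vertices are the rows of `vertices`."""
    v = np.asarray(vertices, dtype=float)
    edges = (v[1:] - v[0]).T              # columns are edges from the first vertex
    gram = edges.T @ edges                # (K-1) x (K-1) Gram matrix of the edges
    k_minus_1 = edges.shape[1]
    # sqrt(det(Gram)) is the volume of the parallelotope spanned by the edges;
    # dividing by (K-1)! gives the simplex volume.
    return math.sqrt(max(np.linalg.det(gram), 0.0)) / math.factorial(k_minus_1)

def perturb(vertices, delta, rng):
    """Move every vertex by exactly `delta` in a random direction (illustrative)."""
    v = np.asarray(vertices, dtype=float)
    noise = rng.normal(size=v.shape)
    noise *= delta / np.linalg.norm(noise, axis=1, keepdims=True)
    return v + noise

rng = np.random.default_rng(0)
original = np.eye(3)                      # vertices e1, e2, e3: a triangle in R^3
perturbed = perturb(original, delta=1e-3, rng=rng)
ratio = simplex_volume(perturbed) / simplex_volume(original)
print(f"volume ratio after perturbation: {ratio:.6f}")
```

For a well-separated simplex and a small `delta`, the printed ratio should stay close to 1, which is the qualitative behaviour the perturbation lemma quantifies. The sketch does not check the lemma's exact constants, since the precise condition on `delta` is not recoverable from the text above.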

Similar resources

A Practical Algorithm for Topic Modeling with Provable Guarantees

Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora. Most approaches to topic model learning have been based on a maximum likelihood objective. Efficient algorithms exist that attempt to approximate this objective, but they have no provable guarantees. Recently, algorithms have been introduced that provide provable bounds, but th...

From Correlation to Hierarchy: Practical Topic Modeling via Spectral Inference

Topic models were originally applied in text analysis for extracting high-level themes from documents, but they work equally well in any setting where users select items from an inventory. Recent work in spectral topic modeling has provided algorithms that operate only on easily-collected summary statistics, rather than exhaustively iterating over the full dataset. The “anchor word” algorithms ...

A Topic Modeling Approach to Rank Aggregation

We propose a new model for rank aggregation from pairwise comparisons that captures both ranking heterogeneity across users and ranking inconsistency for each user. We establish a formal statistical equivalence between the new model and topic models. We leverage recent advances in the topic modeling literature to develop an algorithm that can learn shared latent rankings with provable statistic...

A Topic Modeling Approach to Ranking

We propose a topic modeling approach to the prediction of preferences in pairwise comparisons. We develop a new generative model for pairwise comparisons that accounts for multiple shared latent rankings that are prevalent in a population of users. This new model also captures inconsistent user behavior in a natural way. We show how the estimation of latent rankings in the new generative model ...

Necessary and Sufficient Conditions for Novel Word Detection in Separable Topic Models

The simplicial condition and other stronger conditions that imply it have recently played a central role in developing polynomial time algorithms with provable asymptotic consistency and sample complexity guarantees for topic estimation in separable topic models. Of these algorithms, those that rely solely on the simplicial condition are impractical while the practical ones need stronger condi...

Publication date: 2013